Trading on the Edge

Text File | 1993-08-23 | 21KB | 445 lines

Documentation for the Automate Program

Automate is a NeuralWorks User-Control program used to train or test
networks in an off-line batch-processing mode. It permits multiple
networks to be processed (trained or tested) overnight without your
direct intervention. Automate requires version 5.0 of NeuralWorks
Professional II/PLUS or NeuralWorks Explorer software.

Before running Automate you'll need to create its data file
'automate.dat'. This is an ASCII file that contains the tasks for
Automate to perform, and can be created using any text editor.

Task summary:

Automate understands the tasks (commands) listed below. An example of
each task is shown, with square brackets denoting optional entries.

  open mynet[.nnd]              ! open an existing network
  savenet [newfile[.nnd]]       ! save the net (or save as)
  learn 10000                   ! train a network
  test                          ! test one pass/all
  recall                        ! recall one pass/all
  savebest 10000 500 10         ! savebest (train, test, auto-save)
  savebest 10000 500 10 .95 2   ! savebest with auto-pruning
  randomseed 257                ! reset the random number generator
  initialize                    ! initialize the network
  disable all                   ! disable each input PE and test
  disable 2 3 4 5 6             ! disable specific input PE(s) & test
  set learnfile train[.nna]     ! change the training filename
  set recallfile test[.nna]     ! change the recall/test filename
  set testfile test[.nna]       ! change the recall/test filename
  set disable                   ! disable specific input PEs (no test)
  holdout 1 3 .2                ! PNN & GRNN tests varying Sigma Scale

Program execution:

The following is an outline of the steps to be performed during an
Automate session:

(1) Create and save a network using the Explorer or Professional
    II/PLUS programs. Make sure your data files exist along with your
    network within the NeuralWorks directory. You may want to partly
    train or test the new network to ensure it functions properly.

(2) Create or modify automate.dat using any text editor. The file
    should list the networks and tasks you wish to have performed.
    Make sure this file is saved in ASCII text mode, without
    word-processor formatting characters.

(3) Start Automate running, perhaps waiting until later in the
    evening. The NeuralWorks Professional II/PLUS and NeuralWorks
    Explorer command line to start a User-Control program varies
    depending upon the machine you are using:

      nw2  -xuautomate   ! ProII/PLUS on Unix and 286 IBM-PC
      nwe  -xuautomate   ! Explorer on Unix and IBM-PC
      nw2x -xuautomate   ! ProII/PLUS on 386 and 486 IBM-PC
      nw2s -xuautomate   ! ProII/PLUS on Solaris

    Automate is launched on a Macintosh by double-clicking on the
    automate application icon or name. See your NeuralWorks System
    Guide for more details on creating, compiling, or running
    User-Control programs.

(4) After Automate execution, automate.dat contains 'Successful' or
    'FAILURE' comments at the start of each task it processed.
    Checking automate.dat and nworks.err will indicate the cause of
    any unexpected errors.

Usage Notes:

Tasks can be abbreviated to their first 4 letters. Savenet and
Savebest can be abbreviated to 5 letters.

Before processing any tasks, Automate checks your automate.dat file
for syntax errors. You are informed of any errors found and
processing aborts. The lines in error are marked within automate.dat
with 'FAILURE' and a brief message about the nature of the error.
Processing begins only if no syntax errors are found.

Automate execution can be interrupted by pressing the Escape key.
This halts the current task and (usually) asks whether processing
should continue with the next task or be halted completely. Because
the interrupted task is not completed, use this capability with
caution.

The Learn task automatically saves the network, in binary format,
upon completion of training. The currently opened network is
overwritten. You may want to make a backup copy of the original
network before running Automate.

Savebest also overwrites your current network and saves it in binary
format.
A savebest-log file (*.sbl) is automatically written while Savebest
is running. Savebest parameters in Automate are: Learn Count, Test
Interval, # Retries, Pruning Tolerance and Max # of PEs to Prune. The
two pruning parameters need to be entered only if you wish to employ
hidden layer PE pruning. See the NeuralWorks Reference Manual for
more information. These parameters are space or comma delimited.

Set Learnfile, Set Recallfile, and Set Testfile tasks change the name
of the Learn or Recall/Test file within the I/O Parameters dialog.
This enables data processing from files other than those originally
saved in the network. These tasks apply only to the currently opened
network, and if you save the network these 'alternate' filenames are
saved in the I/O Parameters dialog. These commands should probably
not be used on networks that employ a UserIO program during training
and testing.

The Open task clears previously entered Set commands, so any Set
tasks must follow and not precede an Open task.

Automate's source code file automate.c is provided should you wish to
review or modify the program. If you make changes that are generally
useful to others, please consider sending them to NeuralWare. We will
inspect and test the modifications, and possibly incorporate them
into future versions.

The remainder of this document discusses two special tasks, 'Disable'
and 'Holdout'. These are examples of the kind of useful tasks that
can be added to the program.

The Disable task:

Automate includes a technique named 'Disable', which is a method of
analyzing a trained network to determine the importance of each input
variable. It has the same purpose as the NeuralWorks Run/Explain
command, but takes an entirely different approach.

The Disable task will disable one or more INPUT LAYER processing
elements, run a Test command, and record the R Correlation
measurement (from one or more Confusion Matrixes) into a file named
'disable.log'. This log file is appended to if it exists.
Networks must be trained and specially set up for use with Disable,
as described later. Automate.dat for a Disable run might look like
this:

  open mynet
  disable 2 3     ! disable first 2 input PEs and test
  disable 3 4 5   ! disable 2nd, 3rd, and 4th PE and test
  disable all     ! disable each input PE one at a time, testing at each step

Only 1 PE at a time is disabled using the 'disable all' task. Tasks
'disable every' and 'disable each' are synonyms for the 'disable all'
task.

Note that Disable does not save the network; it only opens it,
disables the specified input PEs, and runs a Test command to
determine the drop in output performance when this PE (or group of
PEs) is not providing input. The specified PEs are marked as disabled
and their Output values are set to 0.0, effectively removing these
input nodes from the upcoming Test command.

Disable results:

Below is the disable.log results file produced from the above
automate.dat file:

!
! network: audisabl.nnd
!   sum R   ConfM1   ConfM2   ConfM3  %impact  ---disabled PEs---
   2.7676   0.9963   0.8605   0.9106    0.000  (baseline)
   2.3815   0.9930   0.5518   0.8365   13.951  2 3
   0.0739   0.6572   0.1017  -0.6850   97.328  3 4 5
   2.7528   0.9887   0.8688   0.8953    0.531  2
   2.3342   0.9846   0.4992   0.8503   15.657  3
   2.0825   0.9073   0.5264   0.6487   24.753  4
   0.8115   0.9253   0.4447  -0.5585   70.676  5

This result is from a partially trained Back-propagation network
using iris_tra.nna and iris_tes.nna. The network has 4 input PEs (PEs
2, 3, 4, and 5) and 3 output PEs.

The first column of figures is the sum of columns 2 through 4.
Columns 2 through 4, labeled ConfM1, ConfM2 and ConfM3, show the R
correlation coefficient as displayed on a Confusion Matrix instrument
within the network. There is one Confusion Matrix graph per output
PE. Column 5, labeled '%impact', shows the drop in network output
occurring when the input PEs (shown in column 6) were disabled. The
first numeric line above is a 'baseline' test, run before disabling
any input PEs.
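
The column relationships in the sample log can be checked with a few
lines of code. The sketch below is only an illustration in Python (it
is not part of automate.c, and the names ROWS, percent_impact and
recompute are made up for this example). It re-adds the three ConfM
columns and recomputes %impact as 100 - (test Sum R / baseline Sum R
* 100). Because the log prints rounded figures, the recomputed values
should be expected to agree with the logged ones only to within about
0.01.

```python
# Rows transcribed from the sample disable.log shown above:
# (sum R, ConfM1, ConfM2, ConfM3, logged %impact, disabled PEs)
ROWS = [
    (2.7676, 0.9963, 0.8605, 0.9106, 0.000, "(baseline)"),
    (2.3815, 0.9930, 0.5518, 0.8365, 13.951, "2 3"),
    (0.0739, 0.6572, 0.1017, -0.6850, 97.328, "3 4 5"),
    (2.7528, 0.9887, 0.8688, 0.8953, 0.531, "2"),
    (2.3342, 0.9846, 0.4992, 0.8503, 15.657, "3"),
    (2.0825, 0.9073, 0.5264, 0.6487, 24.753, "4"),
    (0.8115, 0.9253, 0.4447, -0.5585, 70.676, "5"),
]


def percent_impact(test_sum_r, baseline_sum_r):
    """Percentage drop of a test's Sum R relative to the baseline Sum R."""
    return 100.0 - (test_sum_r / baseline_sum_r * 100.0)


def recompute(rows):
    """Return (disabled PEs, summed ConfM columns, recomputed %impact) per row."""
    baseline_sum_r = rows[0][0]  # the first row is the baseline test
    results = []
    for sum_r, conf1, conf2, conf3, _logged, pes in rows:
        results.append((pes, conf1 + conf2 + conf3,
                        percent_impact(sum_r, baseline_sum_r)))
    return results


if __name__ == "__main__":
    for pes, column_sum, impact in recompute(ROWS):
        print(f"{pes:>10}  sum={column_sum:7.4f}  impact={impact:7.3f}")
```

A negative recomputed value would mean the summed R correlation rose
above the baseline when the listed PEs were disabled.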

The baseline 'Sum R' is used by all subsequent tests to calculate
each test's %impact. If the %impact figure is negative, it indicates
that the network results improved by disabling that PE (or group of
PEs). The calculation for %impact is:

  %impact = 100.0 - (newTestSumR / BaselineSumR * 100.0);

For example, disabling only input PE 2 (the first input PE) shows a
much smaller drop (0.531%) in the network's predictive or classifying
capability, relative to when PE 5 was disabled (70.676%). You might
infer from this that PE 5's inputs are substantially more important
to the network, in a global sense, than are PE 2's inputs. Closer
inspection of the entry for PE 5 also shows a much smaller effect on
the first output class than on the second and third outputs.

If Disable results (or results from the Run/Explain command) indicate
that one or more inputs can be deleted with minimal impact on the
network's performance, you should re-train a new network without
these inputs. This will verify whether the inputs are indeed
unnecessary.

For both Disable and the Run/Explain command, you might wish to run
separate tests using (1) the training file, (2) a test data file,
and/or (3) a combined file of training and test data. The combined
file is likely to give a more global picture than testing only a
subset of your data.

Network setup for Disable:

The Disable task relies upon Confusion Matrix instruments to
calculate the R correlation value, and expects this information to be
found in a temporary file named 'disable.nnp'. This is accomplished
within the network (after it is trained) by doing the following:

(1) For each output PE you are interested in monitoring, create a
    Confusion Matrix instrument for that PE. Using the
    Graph/EasyProbe icon, select "Output Value", "Selected PE",
    "Transform across Epoch", "ConfM", and okay the dialog. Apply the
    graph to the output PE of interest, and repeat the process for
    all output PEs you are interested in monitoring.

(2) Edit each Confusion Matrix graph. Select the "logging active"
    button, and enter "disable" into the filename field. The "append"
    selection should be used, and not "write". This will cause all
    graphs to record their R values into disable.nnp.

(3) VERY IMPORTANT: Unless you want very misleading results, make
    sure that multiple Confusion Matrixes are edited in the 'proper
    order'. Graphs plot in the order they have been accessed in, and
    so the logged values within disable.nnp will be in this same
    order. After setting up the graphs and editing their contents,
    perform a Graph/Edit (or double-click) on each graph in sequence,
    starting with the graph for the first output layer PE, then the
    second graph, etc. Simply bring up the Graph/Edit dialog and okay
    it. If in doubt, press the Escape key and the screen will
    repaint. Watch the order that the graphs appear; they should be
    in a left-to-right fashion.

(4) Do step 3, or the results will be very misleading and you
    probably won't realize anything is wrong.

Specifying PE(s) to disable:

Only Input layer PEs can be disabled using the Disable task. Up to 50
PEs can be disabled at one time, separated by spaces or commas.

The first PE in an Input layer is labeled "2" on the screen, and is
known as PE 2 (the "Bias" PE is PE 1, and it is not in the Input
layer). Entering the number 2 into the PE-list therefore disables PE
2, the first Input layer PE. Likewise, entering 100 disables PE 100,
the 99th Input.

The Set Disable task:

'Set Disable' is a special task with rather esoteric use. Suppose
that after running Disable as shown above, you believe inputs 1 and 2
(PEs 2 and 3) are not contributing much to the output of the network.
To verify this, these PEs need to be disabled, and the network needs
to be initialized and retrained. You can either disable the PEs
manually within the Explorer or Professional II/PLUS using PE/Edit,
or you can use 'Set Disable'. This setting tells Automate to disable
the PEs, without running any tests.
It is assumed you will then Initialize and Train the network, or
'Savenet' the network. Automate.dat for this task looks like this:

  open mynet
  set disable
  disable 2        ! could be one command 'disable 2 3'
  disable 3
  savenet mynet2

The Holdout task:

This task exists only for use with Probabilistic Neural Network (PNN)
and General Regression Neural Network (GRNN) paradigms, and will not
work for any other network type.

Because these paradigms use one hidden layer PE for every training
data case, and because they are often used when few data cases are
available, they lend themselves to a testing technique known as the
'hold-out' method. During 'hold-out', the network is trained with all
data cases except one, which is used for test purposes. The network
results are recorded, and the network is then trained and tested
holding out a different data case. The process repeats for each data
case.

Although this method may be valid for any paradigm when few data
cases are available, the topology of PNN and GRNN networks makes this
method very easy to automate, and does not even require re-training
the network. After initially training the network, only one pass
through the data set is required while disabling the appropriate
'pattern layer' PE for each data case.

Automate's Holdout task differs a bit from the above description. It
is not primarily intended for use in situations with few data cases.
Instead it is concerned with optimizing one of the L/R Schedule
coefficients, the Sigma Scale setting. Holdout requires three
parameters (as explained later) to explore various Sigma Scale
settings, and produces results that indicate the near-optimal value
for this coefficient.

Your network should be trained before using Holdout, and it must not
be using a UserIO program for training. Holdout works by running Test
commands using single data cases extracted from your training data
file. Your network may have a separate Test data set, but the Test
data is not used by Holdout.
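
The hold-out idea described above can be illustrated with a small,
self-contained sketch. It is written in Python purely for
illustration and is not taken from automate.c. It assumes a textbook
PNN in which each stored training case contributes a Gaussian kernel
to its own class, and it treats 'sigma' as a plain kernel width,
which may not correspond exactly to the NeuralWorks Sigma Scale
coefficient. The function names pnn_scores and holdout_rate are
hypothetical. Each case is classified with its own pattern skipped,
so no retraining is needed, and the score returned is the fraction
correct per class, averaged over the classes.

```python
import math


def pnn_scores(x, patterns, labels, sigma, skip=None):
    """Average Gaussian kernel response per class, ignoring pattern `skip`."""
    totals, counts = {}, {}
    for i, (pattern, label) in enumerate(zip(patterns, labels)):
        if i == skip:
            continue  # the held-out case contributes nothing
        dist2 = sum((a - b) ** 2 for a, b in zip(x, pattern))
        kernel = math.exp(-dist2 / (2.0 * sigma ** 2))
        totals[label] = totals.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def holdout_rate(patterns, labels, sigma):
    """Hold out each case in turn; return the mean per-class fraction correct."""
    correct, seen = {}, {}
    for i, (x, label) in enumerate(zip(patterns, labels)):
        scores = pnn_scores(x, patterns, labels, sigma, skip=i)
        predicted = max(scores, key=scores.get)
        seen[label] = seen.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (predicted == label)
    return sum(correct[c] / seen[c] for c in seen) / len(seen)


if __name__ == "__main__":
    # Made-up toy data: two small clusters, for demonstration only.
    patterns = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
    labels = ["a", "a", "a", "b", "b", "b"]
    for sigma in (0.1, 0.2, 0.5):
        print(sigma, holdout_rate(patterns, labels, sigma))
```

Sweeping sigma over several values and keeping the one with the best
score mirrors, in spirit, what the Holdout task does with the Sigma
Scale setting. The sketch needs at least two data cases.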

Internally, Automate looks only at your training data.

Holdout parameters:

The Holdout task parameters explore various Sigma Scale settings. You
supply the start, stop, and step size for this setting, and Automate
performs one Holdout task for each setting. The three arguments are
entered space or comma delimited.

If the first parameter (the starting Sigma) is entered as a negative
value, then the start and stop Sigma become offsets from the
network's current Sigma Scale setting. A positive starting Sigma is
considered an absolute range; negative is an offset, relative to the
current Sigma Scale setting.

For example, a task input line of 'holdout 1.0 3.1 0.5' will test six
different Sigma Scale coefficients {1.0, 1.5, 2.0, 2.5, 3.0, 3.1},
regardless of the Sigma Scale setting currently set within the
network.

A task input line of 'holdout -1.0 3.1 0.5' tests ten different Sigma
Scale coefficients. If the network's current setting is 4.0, then the
tested settings are {3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0,
7.1}.

The minimum Sigma Scale tested is 0.0001, which is set by Automate if
your range tries to set a smaller value (Sigma Scale cannot be less
than or equal to zero).

Holdout task results:

The Holdout task causes Automate to perform a series of hold-out
tests rather quickly, and outputs an overall measure of the network's
performance into an output file named 'holdout.log'. This file is
appended to if it already exists.

For PNN networks, this measure is the 'average classification rate',
the same measure as shown by our Classification Rate instrument. This
instrument counts the percent-correct for each output class, and
reports the average of these scores. A value of 0.0000 is terrible,
and 1.0000 is a perfect score on all classes. Further documentation
for this instrument can be found under the Graph Palette in the
NeuralWorks Reference Manual.

The measurement employed for GRNN networks is the root-mean-square
(RMS) of all errors.
A lower RMS error is better than a higher one.

Holdout.log results indicate whether you should continue to increase
or decrease the network's Sigma Scale setting, or whether you should
run additional experiments with smaller step sizes (increments) to
explore a more localized space.

A sample automate.dat file for Holdout is shown below:

  open my_pnn
  holdout .5 1.5 .2

The resulting holdout.log file is shown below. The results indicate
that the optimal Sigma Scale is somewhere between 0.5 and 0.9, and so
additional Holdout tasks should be run in the range of 0.5 to 0.9,
with a smaller step size.

  Holdout testing of network my_pnn
  Sigma Scale varies from 0.5000 to 1.5000, stepped by 0.2000
    avg classification rate = 0.7533, sigma scale = 0.5000
    avg classification rate = 0.7936, sigma scale = 0.7000
    avg classification rate = 0.7655, sigma scale = 0.9000
    avg classification rate = 0.7412, sigma scale = 1.1000
    avg classification rate = 0.7412, sigma scale = 1.3000
    avg classification rate = 0.7384, sigma scale = 1.5000

At completion, Holdout places the 'best' Sigma Scale value back into
the L/R Schedule. Holdout does not alter your saved network, unless
automate.dat includes a Savenet task. Saving the network then writes
the network with only the Sigma Scale coefficient differing from the
original network.

And finally, consider the following automate.dat file:

  open my_pnn
  holdout .5 1.5 .2
  holdout -.4 .4 .05
  holdout -.1 .1 .005
  savenet my_pnn2

The first Holdout task causes Automate to test between 0.5 and 1.5 as
shown earlier, and a new Sigma Scale coefficient of 0.7 is placed
into the L/R Schedule (because 0.7 generated the highest average
classification rate). The second Holdout task starts with 0.7,
offsets this (because of -.4) and tests the range 0.3 to 1.1 with a
smaller step size, again placing its best Sigma Scale into the L/R
Schedule. The third Holdout task potentially refines it even further.
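
The Sigma Scale values stepped through by Holdout tasks like those
above can be sketched as follows. This is an illustration in Python,
not code from automate.c, and the function name holdout_settings is
hypothetical. Two details are assumptions inferred from the examples
in this document rather than stated rules: that the stop value is
always tested even when it does not fall on the step grid (as in
'holdout 1.0 3.1 0.5'), and that values below the 0.0001 minimum are
raised to 0.0001, with repeated values tested only once.

```python
MIN_SIGMA = 0.0001  # smallest Sigma Scale the document says is tested


def holdout_settings(start, stop, step, current_sigma):
    """List the Sigma Scale values one Holdout task would try (a sketch)."""
    if step <= 0:
        raise ValueError("step must be positive")
    if start < 0:
        # A negative start makes both ends offsets from the current setting.
        start, stop = current_sigma + start, current_sigma + stop
    if start > stop:
        raise ValueError("start must not exceed stop")
    values = []
    i = 0
    while start + i * step < stop - 1e-9:  # tolerance for float error
        values.append(start + i * step)
        i += 1
    values.append(stop)  # assumption: the stop value is always tested
    settings = []
    for value in values:
        value = round(max(value, MIN_SIGMA), 4)
        if not settings or settings[-1] != value:  # drop clamped repeats
            settings.append(value)
    return settings


if __name__ == "__main__":
    print(holdout_settings(1.0, 3.1, 0.5, current_sigma=4.0))
    print(holdout_settings(-1.0, 3.1, 0.5, current_sigma=4.0))
    print(holdout_settings(-0.4, 0.4, 0.05, current_sigma=0.7))
```

With a current setting of 0.7, the call for 'holdout -.4 .4 .05'
spans 0.3 to 1.1, matching the range described for the second Holdout
task.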

The Savenet task saves the network as 'my_pnn2.nnd' with a fairly
optimal Sigma Scale setting.

If the Holdout task fails (i.e. 'FAILURE' is written into file
automate.dat), check nworks.err and also the holdout.log file to
determine the cause.

Automate will run faster during Holdout tasks if the test data set is
loaded into RAM, and also if all on-screen graphs are turned off
during Recall mode.

For more information on the 'hold-out' method and the Sigma Scale
coefficient, see the PNN and GRNN chapters in your Neural Computing
Guide, and the on-line help screen for the InstaNet menu for these
paradigms.

End of automate.txt